A Discourse Resource for Turkish: Annotating Discourse Connectives in the METU Corpus

Authors

  • Deniz Zeyrek
  • Bonnie L. Webber
Abstract

This paper describes first steps towards extending the METU Turkish Corpus from a sentence-level language resource to a discourse-level resource by annotating its discourse connectives and their arguments. The project is based on the same principles as the Penn Discourse TreeBank (http://www.seas.upenn.edu/~pdtb) and is supported by TUBITAK, The Scientific and Technological Research Council of Turkey. We first present the goals of the project and the METU Turkish Corpus. We then describe how we decided what to take as explicit discourse connectives and the range of syntactic classes they come from. With representative examples of each class, we examine explicit connectives, their linear ordering, and the types of syntactic units that can serve as their arguments. We then touch upon connectives with respect to free word order in Turkish and punctuation, as well as the important issue of how much material is needed to specify an argument. We close with a brief discussion of current plans.

Related resources

METU Turkish Discourse Bank Browser

This paper presents the METU Turkish Discourse Bank Browser, a tool developed for browsing the annotated discourse relations in the Middle East Technical University (METU) Turkish Discourse Bank (TDB) project. The tool provides both a clear interface for browsing the annotated corpus and a wide range of search options for analyzing the annotations.

Creating an Annotated Tamil Corpus as a Discourse Resource

We describe our efforts to apply the Penn Discourse Treebank guidelines on a Tamil corpus to create an annotated corpus of discourse relations in Tamil. After conducting a preliminary exploratory study on Tamil discourse connectives, we show our observations and results of a pilot experiment that we conducted by annotating a small portion of our corpus. Our ultimate goal is to develop a Tamil D...

The Leeds Arabic Discourse Treebank: Annotating Discourse Connectives for Arabic

We present the first effort towards producing an Arabic Discourse Treebank, a news corpus where all discourse connectives are identified and annotated with the discourse relations they convey as well as with the two arguments they relate. We discuss our collection of Arabic discourse connectives as well as principles for identifying and annotating them in context, taking into account properties...

The Annotation Scheme of the Turkish Discourse Bank and an Evaluation of Inconsistent Annotations

In this paper, we report on the annotation procedures we developed for annotating the Turkish Discourse Bank (TDB), an effort that extends the Penn Discourse Tree Bank (PDTB) annotation style by using it for annotating Turkish discourse. After a brief introduction to the TDB, we describe the annotation cycle and the annotation scheme we developed, defining which parts of the scheme are an exten...

TDB 1.1: Extensions on Turkish Discourse Bank

In this paper we present recent developments in the Turkish Discourse Bank (TDB). We first summarize the resource and present an evaluation. Then, we describe TDB 1.1, i.e., enrichments to 10% of the corpus (namely, added senses for explicit discourse connectives and new annotations for implicit relations, entity relations and alternative lexicalizations). We explain the method of annotation and

Publication date: 2008